Size Regularized Cut for Data Clustering

نویسندگان

  • Yixin Chen
  • Ya Zhang
  • Xiang Ji
چکیده

We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NP-complete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Clustering of Defect Reports Using Graph Partitioning Algorithms

We present in this paper several solutions to the challenging task of clustering software defect reports. Clustering defect reports can be very useful for prioritizing the testing effort and to better understand the nature of software defects. Despite some challenges with the language used and semi-structured nature of defect reports, our experiments on data collected from the open source proje...

متن کامل

On Trivial Solution and Scale Transfer Problems in Graph Regularized NMF

Combining graph regularization with nonnegative matrix (tri-)factorization (NMF) has shown great performance improvement compared with traditional nonnegativematrix (tri-)factorizationmodels due to its ability to utilize the geometric structure of the documents and words. In this paper, we show that these models are not well-defined and suffering from trivial solution and scale transfer problem...

متن کامل

Repeated Record Ordering for Constrained Size Clustering

One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...

متن کامل

Subspace Clustering via Graph Regularized Sparse Coding

Sparse coding has gained popularity and interest due to the benefits of dealing with sparse data, mainly space and time efficiencies. It presents itself as an optimization problem with penalties to ensure sparsity. While this approach has been studied in the literature, it has rarely been explored within the confines of clustering data. It is our belief that graph-regularized sparse coding can ...

متن کامل

Laplacian regularized low rank subspace clustering

The problem of fitting a union of subspaces to a collection of data points drawn from multiple subspaces is considered in this paper. In the traditional low rank representation model, the dictionary used to represent the data points is chosen as the data points themselves and thus the dictionary is corrupted with noise. This problem is solved in the low rank subspace clustering model which deco...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005